INTERFACE FOR PRESENTING INFORMATION



FIELD OF THE INVENTION 



The present invention relates generally to displaying search results. More 
specifically, providing a visual or multi-media representation of search results is 
disclosed. 



A variety of techniques for identifying records in a database that are responsive to 
a query submitted by a user are well known. One well known application of such 
techniques is their use in providing an Internet search engine to identify potentially 
relevant pages on the World Wide Web (referred to herein as "web pages") in response to 
a query submitted by a searching party. 

It is well known that in order to be able to quickly identify web pages responsive 
to a query, one must first search tens of millions or hundreds of millions of the many 
millions of web pages accessible via the Internet and create a database containing 
information about each page. The information contained in such a database typically 
includes the address of the web page, such as the Uniform Resource Locator (URL) (i.e., 
the information a web browser would need to access the page) and one or more keywords 
associated with the page. The information in the database is used to identify web pages 
that may contain information that is responsive to a query submitted by a requesting 
party, such as by matching a term in a search query to a keyword associated with a web
page. 

A typical search engine presents results in the form of a list of responsive web 
pages. Each entry on the list typically corresponds to a web page, or a group of web 
pages from a single web site. Typically, a hypertext link is included for each web page
listed. Text associated with the page also typically is provided, such as a brief 
description of the page, key words identified by the provider of the page, or excerpts of 
potentially relevant text that appears on the page. 

In some cases, an effort is made to rank the results using a ranking scheme that is
intended to result in the most relevant responsive pages being displayed on the list first.

In some cases, well known statistical techniques are used to group at least certain of the
responsive pages together into clusters or categories of responsive pages. In at least one
case, such categories are displayed to the requesting party in the form of a folder icon for
each category with an appropriate title or label on or near the folder icon. When a
hypertext link associated with the folder icon is selected, the responsive web pages within
the corresponding category are displayed in list form as described above.

The approaches described above for displaying search results have a number of 
shortcomings. First, the use of text to provide an indication to the requesting party of the 
content of web pages responsive to a query requires the requesting party to read the text 
associated with each page and determine whether the text indicates that the web page
may contain the information the requesting party is seeking. This process may be
time-consuming, depending on how long it takes the individual to read and comprehend the
text provided for each responsive page and determine from the text whether or not the 
page contains the information sought, and how many such descriptions the individual 
must evaluate before the desired information is found or the individual either gives up or 
determines the search has not found any web page containing the desired information. 

A second shortcoming of the above-described approach is that the text may not 
provide an accurate or complete indication of the true content of the web page. Much of 
the information available on the World Wide Web is provided in the form of images such 
as still pictures, video, audio, animated GIF's or other multimedia content. A textual 
description or excerpts of text from the page may not provide an adequate indication of 
such content and, at best, is an inefficient and time-consuming way to represent such 
content. 

This second shortcoming has become even more apparent as increasing numbers 
of Internet users have gained access to broadband, high speed Internet connections, such 
as digital subscriber lines (DSL) and cable modem connections. The availability of such 
connections has accelerated the growth of multimedia content available on the Internet, 
increasing the need for an effective way to provide a representation of such content. 
Moreover, search engines that present search results in the form of a list of text entries do 
not take full advantage of the broadband connections now becoming available to an 
increasing number of users. Such connections make it possible to quickly and easily 
view search results displayed using a visual or multimedia representation of each site, 
such as a collage or slideshow of images, one or more video clips, and/or one or more
audio clips from or associated with the content of the site. 

Third, the approach described above can result in a tedious and potentially 
frustrating experience on the part of the requesting party. Reviewing a list of search 
results in the typical list form is much like reading a phone book or the entries in a card 
catalog. In many cases, a requesting party may review pages and pages of search results 
presented in such list form before the entry for the page having the desired information is 
found on the list. In some cases, the requesting party finds that the search has not 
identified a page having the desired information only after significant time has been spent 
reviewing search results in list form. 

Finally, the approach described above results in a display that is static and not 
aesthetically pleasing. Many users are attracted to the Internet because of the visual, 
multi-media, and dynamic content available on the World Wide Web. Many users 
accustomed to such dynamic content find the typical search result list display described 
above to be both unfamiliar and uninteresting compared to other methods of displaying 
information on the World Wide Web. 

It is critical to many providers of search engines that users find the site to be an 
interesting and aesthetically pleasing experience, as well as a useful and efficient way to 
find information. Search engine providers want to maximize the likelihood that a user 
will return to their site for further searches in the future. Advertising provides the only or 
most significant source of revenue for many such providers, and advertising revenue 
typically is based on the number of viewers, or "impressions", a site receives. As a
result, search engine providers depend heavily for their commercial success on their 
ability to attract users to their site. 

Search engines have been provided to locate images, video, music, and other 
multi-media content on the Internet. The image, video, and/or music search engines
provided by companies such as AltaVista™, Lycos™, and Ditto™ are typical. In some
cases, the results of such searches have been presented in a form other than a list of web
pages. In some cases, a thumbnail image of each responsive image retrieved from a
database of images, such as images previously located on pages on the World Wide Web,
is presented. However, in such cases the thumbnail image is used to represent the
full-size image itself, not a web page the content of which is represented by the image, such
as a web page that is responsive to a search query.

A visual interface has also been used to enable a user of the Internet to maintain a
live HTML connection with more than one web site at a time by displaying multiple
active web pages on a single display. Again, this technique has been used only to provide
a split screen view for an Internet browser, and not to present a visual representation that
quickly apprises a viewer of a display of the nature and content of a web page, such as a
web page that is responsive to a search query.

It is also known to employ an advertising agency, graphical artist, or the like to
create a set of images to be displayed in a slide show, such as in the banner
advertisements that are ubiquitous on the Internet, to advertise a company, product, or
service. In some cases, a link is provided in the banner ad to a web site associated with
the company, product, or service. However, such slide shows have been used only to 
provide an advertising message or an inducement to attract users of the Internet to a web 
site associated with the company, product, or service being advertised. Such slide shows 
have not been used to our knowledge to provide a visual representation of the actual 
nature and content of a web page, such as a web page that is responsive to a search query. 

Finally, it is known to provide for visual navigation through a site by enabling a 
user to select icons or images on one page in order to access additional or different 
information on another page. However, to our knowledge a visual interface has never 
been used to present the results of a search by providing a visual representation of web 
pages or categories of web pages, such as web pages or categories of web pages that are 
responsive to a search query. 

Therefore, there is a need for a way to display search results in a manner that 
enables users to find records, such as web pages, having the information they are seeking 
quickly and efficiently. In addition, in the Internet environment there is a need for a way 
to display search results that makes use of the visual and multi-media content available 
on the World Wide Web. There is also a need to present search results in a way that is 
familiar and more satisfactory to users of the Internet. Finally, there is a need to present 
search results in a display that is dynamic, rather than static. 

SUMMARY OF THE INVENTION

Accordingly, an interface for presenting search results is described. Responsive 
records are identified in response to a search query. Responsive records are grouped into 
categories of related responsive records, with a multimedia representation - such as a 
visual representation comprised of one or more images, animations, video segments,
audio segments, or other multimedia content - being provided for each category. A
multimedia representation of the nature and content of each responsive record within
each category also is provided.

It should be appreciated that the present invention can be implemented in
numerous ways, including as a process, an apparatus, a system, a device, a method, or a
computer readable medium such as a computer readable storage medium or a computer
network wherein program instructions are sent over optical or electronic communication
links. Several inventive embodiments of the present invention are described below.

In one embodiment, a lexicon embodying information concerning words, phrases,
and expressions; their meaning; and their semantic and conceptual relations with each
other is built. A database of images is collected. A database of pre-determined, or
"static", search result categories is developed. One or more images are associated with
each static category. Web pages on the World Wide Web are accessed. Each page is
processed to identify a signature for the page and to harvest usable images from the page.
Web page signatures and usable images are stored in a database. One or more images are
associated with each web page. When a search query is received, web pages responsive
to the search query are identified. Responsive web pages are organized into categories of
related responsive web pages. For each category and each responsive web page, one or
more associated images are retrieved. The categories and responsive web pages are
ranked. A display is provided to the requesting party in which one or more of the search
result categories are represented by one or more associated images. By selecting a
category, the requesting party accesses a display presenting one or more responsive web
pages within the category.

Each responsive web page within a category is represented by one or more images
associated with the web page. If one image is used, the display is static. If more than one
image is used, the display is dynamic and the images alternate. In one embodiment, more
than one image is used to represent each responsive web page and the images are
arranged in a slideshow format.

In one embodiment, at least certain of the categories and/or certain of the
responsive web pages are represented by one or more segments (or "clips") of video,
audio, and/or other multimedia content. In one embodiment, at least certain of the
responsive web pages are represented by one or more segments of video, audio, and/or
other multimedia content harvested from the responsive web page.

In one embodiment, the disclosed interface is used in connection with a directory
of information sources, such as the Open Directory Project on the Internet, to represent
directory entries and categories of entries.

In one embodiment, a tag is used by the provider of a web page to identify the
image(s), video, audio, or other multimedia content on the web page that the provider
considers to be the most relevant for purposes of representing the nature and content of
the web page. In one embodiment, a different tag is used for each type of multimedia
content (e.g., one for each of static images, video, audio, etc.).

In one embodiment, a system and method are disclosed for presenting 
information. Categories are determined for found information by analyzing the content 
of the information. The categories are correlated with images that represent the 
categories. Images are displayed that correspond to the categories. 

In one embodiment, a system and method are disclosed for presenting 
information. Textual content of the information is analyzed. The textual content is 
associated with image content. The image content is displayed to illustrate the 
information. 

In one embodiment, a system and method are disclosed for building enriching 
content for a video presentation. Metadata related to the presentation is analyzed. 
Content is associated with the video presentation based on the analysis. The content is 
presented along with the video presentation.

These and other features and advantages of the present invention will be presented 
in more detail in the following detailed description and the accompanying figures which 
illustrate by way of example the principles of the invention. 



Attorney Docket No. SIFTP001 






BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be readily understood by the following detailed 
description in conjunction with the accompanying drawings, wherein like reference 
numerals designate like structural elements, and in which: 

Figure 1 is a block diagram illustrating a system used in one embodiment to 
provide a visual representation of search results. 

Figure 2 is a flowchart of a process used in one embodiment to provide a visual 
representation of database search results in response to a user query. 

Figure 3 is a block diagram illustrating the organization of a database 300 stored 
in database 106 of Figure 1 in one embodiment. 

Figure 4 is a process flow showing in more detail a process used in one 
embodiment to implement step 204 of Figure 2. 

Figure 5 is a flowchart illustrating a process used in one embodiment to process 
web pages as described in step 206 of Figure 2. 

Figure 6 is a flowchart illustrating the process used in one embodiment to 
implement step 208 of Figure 2. 

Figure 7 is a flowchart illustrating a process used in one embodiment to
implement step 210 of Figure 2.

Figure 8 is an exemplary search result categories display 800 used in one
embodiment to display exemplary search result categories for a hypothetical search using 
the word "heart" as the search query. 

Figure 9 is an exemplary responsive web pages display 900 used in one 
embodiment to implement step 708 of Figure 7. 

DETAILED DESCRIPTION

A detailed description of a preferred embodiment of the invention is provided
below. While the invention is described in conjunction with that preferred embodiment,
it should be understood that the invention is not limited to any one embodiment. On the
contrary, the scope of the invention is limited only by the appended claims and the
invention encompasses numerous alternatives, modifications and equivalents. For the
purpose of example, numerous specific details are set forth in the following description in
order to provide a thorough understanding of the present invention. The present
invention may be practiced according to the claims without some or all of these specific
details. For the purpose of clarity, technical material that is known in the technical fields
related to the invention has not been described in detail so that the present invention is
not unnecessarily obscured.

Figure 1 is a block diagram illustrating a system used in one embodiment to
provide a visual representation of search results. One or more users 102 connect via the
Internet with a search engine website system 100 used to provide a search engine web
site by means of computer system 104 and database 106. In one embodiment, computer
system 104 comprises a super computer comprised of multiple computer processors and
adequate memory, data storage capacity, and Internet bandwidth to provide search engine
services via the Internet to multiple users simultaneously. In one embodiment, computer
system 104 is configured to provide a web page via the Internet and to receive and
process search queries received from users via the web page. The computer system 104
is connected to database 106 and is configured to store data in database 106 and to
retrieve data stored in database 106.

In one embodiment, computer system 104 is comprised of at least two computers.
One computer is configured as a front end web server configured to provide a web page
via the Internet capable of receiving search queries from users via the Internet. The front
end web server performs the special task of presenting web pages to users and acting
as an interface or conduit for information between the separate computer or computers,
on the one hand, and the users of the web site, on the other hand. In such an embodiment,
the logic functions necessary to process and provide results for search queries are
performed by one or more additional computers configured as business logic servers. The
front end web servers maintain a direct connection to the Internet and a connection to the
business logic server or servers. The business logic servers, in turn, are connected to
database 106 and are responsible for storing information to database 106 and retrieving
information from database 106 to be processed by the business logic servers and/or to be
provided to users via the front end web server(s).

The search engine website system 100 also is connected via the Internet to a
plurality of web pages 110, denominated as web page 1 through web page n in Figure 1.
Given the number of web pages currently available on the Internet, the number of web
pages that may be accessible via a search engine such as one provided by search engine
website system 100 may be on the order of tens of millions or hundreds of millions of
web pages.

In order to be able to process search queries and identify responsive web pages,
the computer system 104 is configured to access the web pages 110 in advance of
receiving search queries from users 102 in order to build a database of information
necessary to identify web pages that are responsive to a search query and provide an
efficient, useful, and visual representation of the search results. The computer system
104 may access the web pages 110 using any one of a number of readily available tools
to perform that task, such as commercially available web crawler products that contain
computer instructions necessary for a computer system such as computer system 104 to
access a large number of web pages systematically by crawling from one page to the
next, and so on. As each web page is accessed, information about the web pages is
gathered and processed as described more fully below. The information gathered about
the web pages 110 is stored in database 106 by computer system 104.
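
The crawling behavior described above can be illustrated with a minimal sketch. It is not the commercial crawler product the text refers to; it assumes a hypothetical in-memory link graph (`LINKS`) in place of live web pages and a caller-supplied `fetch_links` function, and it visits pages breadth-first, which is only one of several possible traversal orders.

```python
from collections import deque

# Hypothetical link graph standing in for the live web: page -> pages it links to.
LINKS = {
    "page1": ["page2", "page3"],
    "page2": ["page3"],
    "page3": ["page1", "page4"],
    "page4": [],
}

def crawl(start, fetch_links, max_pages=1000):
    """Visit pages breadth-first from `start`, following links, each page once."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)  # a real system would gather page information here
        for link in fetch_links(page):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

ORDER = crawl("page1", lambda page: LINKS.get(page, []))
```

The `max_pages` limit is an assumption added so the sketch always terminates; the source does not describe any such limit.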

Figure 2 is a flowchart of a process used in one embodiment to provide a visual
representation of database search results in response to a user query. The process begins
with a step 202 in which a lexicon and an image database are built. The lexicon 204
comprises a mapping of words, phrases and idiomatic expressions used in a given
language, and their semantic, logical, and conceptual relationship to one another. In one
embodiment, the lexicon 204 includes a mapping of collocations, i.e., the frequency with
which words appear together in a language. Statistical natural language processing
techniques for developing such a lexicon are well known in the art of linguistics. See,
e.g., Automatic Text Processing: The Transformation, Analysis and Retrieval of
Information by Computer, by Gerard Salton (Addison Wesley Publisher Co., reprinted
[...] Erlbaum Assoc. 1991), which are hereby incorporated by reference for all purposes.
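
As a rough illustration of a collocation mapping, the sketch below counts how often two words appear next to each other in a text. It is a simplified stand-in for the statistical techniques the passage mentions, not the method of the cited references; the function name, the `window` parameter, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

def collocation_counts(text, window=2):
    """Count unordered word pairs occurring within `window` tokens of each other.

    With the default window of 2, only adjacent words are paired.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            counts[tuple(sorted((word, other)))] += 1
    return counts

SAMPLE = "heart attack symptoms and heart attack treatment"
COUNTS = collocation_counts(SAMPLE)
```

Here `COUNTS[("attack", "heart")]` is 2 because the two words are adjacent twice in the sample. A production lexicon would presumably normalize such counts against a much larger corpus.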

The lexicon is derived from a corpus of language content. The corpus is
comprised of a very large body of content drawn from a wide variety of sources. The
corpus may include raw content from sources such as encyclopedias, newspapers, [...]
natural language processing also are available. Some such corpora include tags or
[...] that may be useful in building a lexicon, such as tags relating to sentence
structure and tags identifying parts of speech. Statistical natural language
processing techniques are applied to the corpus to build a lexicon to be used by search
engine website system 100 of Figure 1.

In one embodiment, the images database is comprised of images drawn from web [...]
commercially for use as clip art. In one embodiment, images generated by graphical
designers or artists for the express purpose of being included in the images database also
are included. In one embodiment, one or more images in the database are modified by
adding a title, caption, or ticker associated with the image. Such metadata that is
included with the image can be used to help determine a signature for each image. The
image signature identifies the words, phrases, expressions, and concepts the image may
be useful in representing. The image signature is stored. The image signature may be
derived by noting the context of the page in which the image is displayed and assuming
that the image is relevant to that context. The process continues with step 204 in which a
database of static categories is created. As described more fully below, the static
categories will be used, when suitable, to organize database records identified in response
to a search query into categories for more convenient and efficient review by the user.
The term "static" categories refers to the fact that the categories developed in step 204 are
created in advance and do not change in response to a particular query or set of search
results. As described more fully below, in one embodiment additional or different
categories are created dynamically in some circumstances, such as where the responsive
records cannot be grouped into a reasonable number of static categories that accurately
describe the content of the records in the group.

Next, in step 206, individual web pages are processed to develop a signature for
each page. The signature embodies information concerning the identity, location, nature,
and content of each of the web pages 110 to be included in the database. In addition to
developing a signature for each page, the images contained in each page are evaluated
and, if usable, are added to the images database and associated with the web page as an
image suitable for providing a visual representation of the content of the page. If no
image, or an insufficient number of images, taken from a web page is identified as
suitable for providing a visual representation of the content of the page, other images
from the images database, or a picture of the web page itself as viewed by a browser, are
associated with the page.

Then, in step 208, a search query is received from the user and processed.
Responsive web pages are identified and grouped into appropriate static and/or dynamic
categories, and the result categories and the responsive web pages within each category
are ranked, as described below.
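
A minimal sketch of the grouping and ranking in step 208 follows. The source does not specify the grouping or ranking formulas at this point, so everything here is an assumption: pages are assigned to the static category sharing the most terms with their signature, pages matching no category fall into a placeholder `"uncategorized"` group (a crude stand-in for dynamically created categories), and categories are ordered by the summed relevance scores of their pages.

```python
def group_and_rank(pages, categories):
    """Group pages by best-matching category, then rank pages and categories.

    pages: url -> (set of signature terms, relevance score)
    categories: category name -> set of category terms
    """
    groups = {}
    for url, (terms, score) in pages.items():
        overlaps = {name: len(terms & cat) for name, cat in categories.items()}
        best = max(sorted(overlaps), key=overlaps.get) if overlaps else None
        if best is None or overlaps[best] == 0:
            best = "uncategorized"  # hypothetical fallback group
        groups.setdefault(best, []).append((score, url))
    ranked = []
    for name, members in groups.items():
        members.sort(key=lambda m: (-m[0], m[1]))  # highest score first
        ranked.append((name, [url for _, url in members]))
    ranked.sort(key=lambda g: (-sum(pages[u][1] for u in g[1]), g[0]))
    return ranked

PAGES = {
    "a": ({"heart", "cardiology"}, 0.9),
    "b": ({"heart", "valentine"}, 0.5),
    "c": ({"cardiology"}, 0.7),
    "d": ({"poker"}, 0.2),
}
CATEGORIES = {
    "medicine": {"cardiology", "surgery"},
    "romance": {"valentine", "love"},
}
RESULT = group_and_rank(PAGES, CATEGORIES)
```

With this sample data the "medicine" category is listed first because its two pages carry the highest combined score.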

Finally, in step 210, search results are displayed to the user using a visual 
representation described more fully below. 



Figure 3 is a block diagram illustrating the organization of a database 300 stored
in database 106 of Figure 1 in one embodiment. The database includes a corpus database
302 used to store the corpus described above. The database also includes a lexicon 304,
built using the corpus, as described above. The third component of the database 300 is
the images database 306.

The database 300 also includes a categories database 308. Finally, the database
300 includes a web page signatures database 310 in which the signature of each web page
and an identification of the image(s) associated with the web page are stored.
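
The five components of database 300 could be modeled, purely for illustration, as a set of in-memory containers. The class and field names below are assumptions rather than anything the source defines; only the reference numerals in the comments come from the description of Figure 3.

```python
from dataclasses import dataclass, field

@dataclass
class ImageRecord:
    image_id: str
    signature: set  # words and concepts the image may represent

@dataclass
class PageSignature:
    url: str
    terms: set                                     # signature terms for the page
    image_ids: list = field(default_factory=list)  # images associated with the page

@dataclass
class Database300:
    corpus: list = field(default_factory=list)           # corpus database 302
    lexicon: dict = field(default_factory=dict)          # lexicon 304
    images: dict = field(default_factory=dict)           # images database 306
    categories: dict = field(default_factory=dict)       # categories database 308
    page_signatures: dict = field(default_factory=dict)  # signatures database 310

    def images_for_page(self, url):
        """Return the stored image records associated with a page."""
        page = self.page_signatures[url]
        return [self.images[i] for i in page.image_ids if i in self.images]

DB = Database300()
DB.images["img1"] = ImageRecord("img1", {"heart"})
DB.page_signatures["http://example.com/a"] = PageSignature(
    "http://example.com/a", {"heart"}, ["img1", "missing"]
)
FOUND = DB.images_for_page("http://example.com/a")
```

Storing only image identifiers in the page signature, rather than the images themselves, mirrors the statement that database 310 holds "an identification of the image(s)" for each page.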

Figure 4 is a process flow showing in more detail a process used in one
embodiment to implement step 204 of Figure 2. The process begins with a step 402 in
which a database of static search categories and associated subcategories is built. An
effort is made to anticipate the topics, types of search, and types of information users may
be interested in finding by means of queries submitted to the search engine website. The
lexicon described above is used in one embodiment to develop categories and associated
subcategories that may be useful in presenting search results. In one embodiment, the
lexicon is used to identify words, phrases, and/or expressions that have a close semantic
or conceptual relationship with a word or combination of words that it is anticipated may
be included in a query. These related words, phrases, and/or expressions are then stored
as static categories and subcategories associated with the word or combination of words.
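
One way to picture step 402 is the sketch below, which derives static categories for an anticipated query word from a toy lexicon. The lexicon structure (a relatedness value between 0 and 1 for each related term), the threshold, and the sample entries are assumptions for illustration; the source does not say how closeness of relationship is measured.

```python
# Hypothetical lexicon: term -> {related term: relatedness in [0, 1]}.
LEXICON = {
    "heart": {"cardiology": 0.9, "valentine": 0.7, "artichoke": 0.4, "poker": 0.1},
}

def static_categories(lexicon, anticipated_term, threshold=0.3):
    """Return closely related terms, strongest first, to store as static categories."""
    related = lexicon.get(anticipated_term, {})
    ranked = sorted(related.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, strength in ranked if strength >= threshold]

# Build the category database in advance, before any query is received.
CATEGORY_DB = {term: static_categories(LEXICON, term) for term in LEXICON}
```

Because the categories are computed ahead of time and stored, they do not change with any particular query, which is what makes them "static" in the sense used above.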

Next, in step 404, at least one image from the image database is associated with
each category or subcategory stored in the category database. As noted above, when
images are stored in the image database, information about the image and the words and
concepts the image may be appropriate to represent also are stored in the database. This
information is used to match images from the database with corresponding categories and
subcategories in the category database so that an image may be used to provide a visual
representation of the category to a user.
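
The matching in step 404 can be sketched as a term-overlap comparison between a category and each stored image signature. The overlap measure, the function name, and the sample signatures are assumptions; the source says only that the stored information is used to match images with categories.

```python
def best_images_for_category(category_terms, image_signatures, limit=1):
    """Rank images by how many category terms their signature shares."""
    wanted = set(category_terms)
    scored = []
    for image_id, signature in image_signatures.items():
        overlap = len(wanted & set(signature))
        if overlap:
            scored.append((-overlap, image_id))  # negative so larger overlap sorts first
    return [image_id for _, image_id in sorted(scored)[:limit]]

# Hypothetical image signatures: image id -> words and concepts it may represent.
IMAGE_SIGNATURES = {
    "anatomy.gif": {"heart", "cardiology", "anatomy"},
    "roses.jpg": {"valentine", "love", "heart"},
    "tractor.jpg": {"farm"},
}
```

For example, a category described by the terms "cardiology" and "heart" would be matched to `anatomy.gif`, which shares both terms, ahead of `roses.jpg`, which shares one.
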

Figure 5 is a flowchart illustrating a process used in one embodiment to process
web pages as described in step 206 of Figure 2. Each step in the flowchart shown in
Figure 5 is performed with respect to each web page accessed in the manner described
[...] page is accessed. Next, in step 504, the page is analyzed to generate a signature for the
page. This process includes the application of well known statistical natural language
processing techniques to the text content of the web page to identify the words, subjects,
and concepts that are the primary, or a significant, focus of the content of the page.

In addition, the HTML (hypertext markup language) or other computer code used
to display the web page to those accessing the web page is analyzed to extract 
information about the page that may not be available from the text content of the page 
itself. For example, computer programming languages such as HTML provide a way to 
tag information in the code, such as to indicate the meaning, nature, or significance of the 
information. A standard setting body establishes standards for the use of such tags to 
annotate the code. One well known application of such tags is the use of a tag to identify 
keywords that the providers of the page believe describe the nature and content of the 
page. Such keywords may be used, in addition to information derived from the natural 
language processing techniques referred to above, to develop a signature for the page. 
The signature will later be used to identify pages responsive to a query from a user. 
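
The keyword tag mentioned above corresponds to the HTML `meta` element with `name="keywords"`. A minimal sketch of extracting those keywords with Python's standard `html.parser` module follows; the class name, function name, and sample markup are illustrative assumptions, and the natural language processing half of the signature is not shown.

```python
from html.parser import HTMLParser

class KeywordExtractor(HTMLParser):
    """Collect the comma-separated values of <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

def extract_keywords(html):
    parser = KeywordExtractor()
    parser.feed(html)
    return parser.keywords

SAMPLE_HTML = (
    '<html><head><meta name="keywords" content="Heart, cardiology , health">'
    "</head><body>Text</body></html>"
)
KEYWORDS = extract_keywords(SAMPLE_HTML)
```

The extracted keywords could then be merged with terms found by text analysis to form the page signature.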

The process continues with step 506 in which the images included in the web 
page are identified and evaluated. In one embodiment, all GIF and JPEG files on a web
page, and all code associated with such files, are evaluated. GIF and JPEG files are
commonly used to provide graphical images on web pages. In one embodiment, an 
automatic parsing algorithm is used to determine whether each image on a web page may 
be suitable to be added to the images database, either for use in representing a category or 
subcategory of information, or to be used to provide a visual representation of the content 
of either the page from which it is harvested or another web page that contains 
information related to the image but that does not itself have images suitable for use in 
representing the page. The properties of each image that are evaluated include the 
location of the image within the page, whether the image has a subject or title associated 
with it, the way the image is referred to in the text on the web page, and the size of the
image and its associated computer file. For example, an image that is relatively large, 
centrally located, and annotated with a title or caption that correlates with the signature of 
the page may be selected as an image suitable for representing the content of the page. 
By contrast, an image that is small, has no text associated with it, and appears on the 
bottom or periphery of the web page may be rejected. 
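
The evaluation of image properties in step 506 might be approximated by a simple point score, as sketched below. The source names the properties (size, location, associated title or caption) but gives no thresholds or weights, so the 200 by 150 pixel cutoff, the one-point-per-property scoring, and the minimum score of 2 are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class ImageInfo:
    width: int
    height: int
    centrally_located: bool
    caption: str = ""

def suitability_score(image, page_signature):
    """Award one point each for size, central placement, and a matching caption."""
    score = 0
    if image.width * image.height >= 200 * 150:  # assumed "relatively large" cutoff
        score += 1
    if image.centrally_located:
        score += 1
    if set(image.caption.lower().split()) & page_signature:
        score += 1
    return score

def is_suitable(image, page_signature, minimum=2):
    return suitability_score(image, page_signature) >= minimum

SIGNATURE = {"heart", "cardiology"}
GOOD = ImageInfo(400, 300, True, "Human heart diagram")
POOR = ImageInfo(16, 16, False)
```

Under these assumptions `GOOD` scores 3 and would be harvested, while `POOR` scores 0 and would be rejected, matching the two examples in the text.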

In step 508, images on the page that may be usable to represent either a search 
category, the page itself, or some other page are harvested from the page and stored in the 
images database. As noted above, a signature for the image also is stored. 

Next, in step 510 the overall appearance of the web page itself is evaluated to 
determine whether a picture of the entire web page should be captured and stored in the 
database. For example, a web page that contains a large image or several images closely 
related to the signature for the web page may be represented visually by a reduced size 
image of the entire web page. Products and services for obtaining such reduced size 
images of entire web pages are available commercially, including products and services 
that provide a GIF capture of a target web page. 

In one embodiment, the above-described techniques for identifying images in a 
web page that may be suitable for providing a visual representation of the web page are 
replaced or augmented by enabling providers of web pages to identify the images on the 
page that the provider believes are the most relevant or useful. For example, providers 
could be provided with a way to tag the HTML or other code used to provide the page in 
a manner that identifies the image or images on the web page that the provider of the web 
page believes are the most relevant or important images on the page, or the ones most 
suitable to be used to provide a visual representation of the page such as to present search 
results. A standard for such tagging of images has not yet been provided, but could 
readily be established by the standard setting bodies for languages, such as HTML, that 
are commonly used to provide web pages. For example, such a standard could easily be 
modeled on the standard that currently enables providers of web pages to identify 
keywords for a web page. 
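
As a purely hypothetical illustration of such tagging, a provider might list representative images in a meta tag modeled on the existing keywords meta tag. The tag name "representative-images" and its comma-separated format are inventions of this sketch, since, as noted above, no such standard exists. The sketch uses the Python standard library HTML parser to read the tag.

```python
from html.parser import HTMLParser


class RepresentativeImageParser(HTMLParser):
    """Collects image URLs from a hypothetical 'representative-images' meta tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # The tag name below is an assumption; no such standard is defined.
        if (attributes.get("name") or "").lower() == "representative-images":
            content = attributes.get("content") or ""
            self.images.extend(u.strip() for u in content.split(",") if u.strip())


def provider_tagged_images(html):
    """Return the image URLs the provider has tagged, in the order listed."""
    parser = RepresentativeImageParser()
    parser.feed(html)
    return parser.images
```

A page with no such tag simply yields an empty list, in which case the harvesting techniques described above would be used instead.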

The process shown in Figure 5 concludes with step 512, in which one or more 
images from the images database are associated with the web page. Preferably, the 
images associated with the web page will be images harvested from the page itself. 
However, in cases where the web page itself did not have a sufficient number of images 
suitable for use in providing a visual representation of the page as a search result, as 
described above, other images from the images database having a signature or description 
that matches the signature of the page may be drawn from the images database to be 
associated with the web page for future use in providing a visual representation of the 
page. 
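
Step 512 can be sketched as follows, by way of example only. The sketch assumes that signatures are sets of terms, that "matching" means sharing at least one term, and that three images per page is a sufficient number. None of these choices is specified in the disclosure.

```python
def associate_images(page_signature, harvested, images_db, wanted=3):
    """Pick images for a page, preferring images harvested from the page itself.

    harvested: image ids harvested from the page, best first.
    images_db: dict mapping image id -> set of signature terms (assumed layout).
    """
    chosen = list(harvested[:wanted])
    if len(chosen) < wanted:
        # Fall back to other images whose signature overlaps the page signature.
        candidates = [
            (len(signature & page_signature), image_id)
            for image_id, signature in images_db.items()
            if image_id not in chosen and signature & page_signature
        ]
        # Largest overlap first; ties broken by image id for determinism.
        candidates.sort(key=lambda c: (-c[0], c[1]))
        chosen.extend(image_id for _, image_id in candidates[: wanted - len(chosen)])
    return chosen
```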

In one embodiment, a score is assigned to the web page and stored in the web 
page signature database to provide an indication of the extent to which the page contains 
high quality images and/or other media content that is relevant to the main information 
contained in the page. In one embodiment, this assessment of the visual and/or 
multimedia content of each web page is used, among other factors, to determine a relative 
ranking for each web page identified as responsive to a query. Using this approach, web 
pages that are rich in visual and/or multi-media content are more likely to receive a 
higher ranking and, therefore, to appear in one of the first several layers or pages of 
search results presented to the requesting party. In many cases, this approach will result 
in a search results display that is more visually interesting and familiar to the requesting 
party. 
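
One possible way to use the media score as one factor among others is a weighted sum, sketched below. The linear combination and the default weight of 0.3 are assumptions of this sketch. The disclosure does not specify how the factors are combined.

```python
def rank_pages(pages, media_weight=0.3):
    """Order pages by a blend of text relevance and media score.

    pages: list of (url, text_relevance, media_score), both scores in [0, 1].
    media_weight: assumed share of the ranking given to the media score.
    """
    def combined(page):
        _, text_relevance, media_score = page
        return (1.0 - media_weight) * text_relevance + media_weight * media_score

    return [url for url, _, _ in sorted(pages, key=combined, reverse=True)]
```

With a weight of zero the ranking reduces to text relevance alone, which makes the effect of the media score easy to compare.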

Figure 6 is a flowchart illustrating the process used in one embodiment to 
implement step 208 of Figure 2. The process begins with step 602 in which a search 
query is received from a user. Next, in step 604 the query is analyzed to determine the 
words, phrases, expressions, and concepts most closely associated with the word or 
combination of words provided by the user in the query. Next, in step 606 the database 
of web page signatures is searched to identify web pages having a signature that matches 
in whole or in part the word or combination of words in the query. 

Then, in step 607, tentative search result categories are generated dynamically 
using collocations. That is, the lexicon is used to identify words or phrases that often 
appear together with one or more search terms or phrases. Next, in step 608, it is 
determined whether the categories generated based on the collocations are satisfactory. 
The signatures of the responsive web pages are searched to determine if the collocations 
are associated with a significant portion of the web pages such that the collocations 
provide a satisfactory means of grouping the results (e.g., by defining a manageable 
number of categories that include most of the web pages and with sufficient distribution 
of pages among the categories). 

If the categories based on collocations are satisfactory, the process proceeds to 
step 614, in which the categories are ranked in terms of how closely they are related to 
the query. Also, the responsive web pages within each category are ranked within the 
category based on how closely the signature for each web page matches the query. 
Specific techniques for performing such ranking are well known in the art and are beyond 
the scope of this disclosure. 
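
The collocation-based grouping of step 607 and the test of step 608 can be sketched as follows, for illustration only. The lexicon layout (a mapping from a term to its collocates) and the numeric thresholds for "manageable number", "most of the web pages", and "sufficient distribution" are assumptions of this sketch.

```python
def collocation_categories(query_terms, lexicon, signatures):
    """Step 607 sketch: one tentative category per collocate found in the results.

    lexicon: dict term -> list of words that often appear with it (assumed layout).
    signatures: dict url -> set of signature terms for each responsive page.
    """
    categories = {}
    for term in query_terms:
        for collocate in lexicon.get(term, []):
            members = {url for url, sig in signatures.items() if collocate in sig}
            if members:
                categories[collocate] = members
    return categories


def grouping_is_satisfactory(categories, total_pages, max_categories=10,
                             min_coverage=0.8, max_share=0.6):
    """Step 608 sketch, using assumed thresholds."""
    if not categories or len(categories) > max_categories or total_pages == 0:
        return False
    # Coverage: the categories together must include most of the pages.
    covered = set().union(*categories.values())
    if len(covered) / total_pages < min_coverage:
        return False
    # Distribution: no single category may hold too large a share of the pages.
    largest = max(len(members) for members in categories.values())
    return largest / total_pages <= max_share
```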

If the categories based on collocations are not satisfactory, the process continues 
with step 609, in which an attempt is made to associate the responsive web pages with 
previously-defined categories from the categories database. In one embodiment, the 
categories most closely related to the signature for each web page are identified and 
assigned a weight indicating how closely the category matches the signature. The 
weighted static categories are then evaluated in step 610 to determine if the responsive 
web pages can be grouped within a reasonable number of static categories that will both 
encompass a sufficient number of the web pages and describe the nature and content of 
the web pages within each group adequately. In one embodiment, the weighted static 
categories are evaluated to determine whether the responsive results may be represented 
adequately by from one to ten static categories. 
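
Steps 609 and 610 can be sketched as shown below. The use of Jaccard similarity as the weight, the minimum weight of 0.2, and the 80 percent coverage requirement are assumptions chosen for the sketch. Only the range of one to ten static categories comes from the embodiment described above.

```python
def jaccard(a, b):
    """Overlap of two term sets, used here as the (assumed) category weight."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_static_categories(signatures, static_categories, min_weight=0.2):
    """Step 609 sketch: attach each page to its most closely related category.

    Returns a dict mapping category name -> list of (url, weight).
    """
    groups = {}
    for url, sig in signatures.items():
        best_name, best_weight = None, 0.0
        for name, terms in static_categories.items():
            weight = jaccard(sig, terms)
            if weight > best_weight:
                best_name, best_weight = name, weight
        if best_name is not None and best_weight >= min_weight:
            groups.setdefault(best_name, []).append((url, best_weight))
    return groups


def static_grouping_ok(groups, total_pages, min_coverage=0.8):
    """Step 610 sketch: one to ten categories covering enough of the pages."""
    grouped = sum(len(members) for members in groups.values())
    return (1 <= len(groups) <= 10 and total_pages > 0
            and grouped / total_pages >= min_coverage)
```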

If the static categories do provide a satisfactory grouping and representation of the 
responsive web pages, the process proceeds to step 614 in which the categories and 
responsive web pages are ranked. If in step 610 it is determined that the matching of 
responsive web pages to static categories has not resulted in a satisfactory grouping and 
representation of the search results, the process proceeds to step 612 in which well known 
statistical techniques are used to group the responsive web pages into clusters of related 
responsive web pages based on the signature of each page. Statistical natural language 
processing techniques are then used to generate a category name dynamically for each 
cluster. Then, the process proceeds to step 614, in which the dynamically generated 
categories are ranked and the web pages within each category are ranked, as described 
above. 
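
The overall order of fallback in Figure 6 (collocations first, then static categories, then statistical clustering) can be summarized in a short control-flow sketch. The function and its tuple-based interface are hypothetical. Each strategy is passed in as a pair of callables so that the sketch stays independent of how any one grouping technique is implemented.

```python
def choose_categories(strategies):
    """Try each grouping strategy in order and keep the first satisfactory one.

    strategies: ordered list of (name, build, is_ok) tuples, where build()
    returns a grouping and is_ok(grouping) reports whether it is satisfactory.
    The last strategy (clustering in Figure 6) is used unconditionally.
    """
    for name, build, is_ok in strategies[:-1]:
        grouping = build()
        if is_ok(grouping):
            return name, grouping
    name, build, _ = strategies[-1]
    return name, build()
```

Whichever branch is taken, the resulting categories and the pages within them would then be ranked in step 614.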

The process shown in Figure 7 begins with step 702, in which images associated with the categories 
to be displayed are retrieved from the images database. Next, in step 704, a web page is 
generated to provide a visual representation of the result categories. Then, in step 706, 
the images associated with the web pages to be presented as search results are retrieved 
from the images database. Finally, in step 708, one or more web pages are generated to 
provide a visual representation of the responsive web pages within each category. 

Figure 8 is an exemplary search result categories display 800 used in one 
embodiment to display exemplary search result categories for a hypothetical search using 
the word "heart" as the search query. As shown in Figure 8, the search result categories 
display 800 is divided into a 3 x 3 grid of 9 cells. The center cell 802 contains an image 
of a question mark and the text of the search query, in this case the word "heart". The 
remaining 8 cells of the grid, cells 804a-804h, are used to provide a visual representation 
of the eight top ranked search result categories. The exemplary categories shown in 
Figure 8 include the categories "aspirin", "heart disease", "nutrition", "surgery", "card 
games", "physiology", "romance", and "exercise". In each of cells 804a-804h, the name 
of the category displayed in the cell is listed at the bottom of the cell and an image that 
provides a visual representation of the result category is displayed in the cell above the 
category name. The search result categories display 800 also includes a button 806 
which, when selected, will result in the next eight categories by rank (or the remaining 
categories, if fewer than eight remain) being displayed in the search results categories 
display 800. While the exemplary categories display 800 presents eight categories at a 
time, it is readily apparent that any number of categories may be displayed at one time, 
and that geometries other than the 3 x 3 grid geometry shown in Figure 8, such as a hub 
and spoke arrangement, can be used. 
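
The 3 x 3 layout and the behavior of button 806 can be sketched as follows. The row-major ordering of the outer cells and the use of None for an empty cell are assumptions of this sketch. The figure does not dictate how cells 804a-804h map to ranks.

```python
PER_PAGE = 8  # eight outer cells surround the center cell of the 3 x 3 grid


def grid_page(query, ranked_categories, page=0):
    """Return (grid, has_more) for one screen of the categories display.

    grid is a list of three rows of three cells; the center cell holds the
    query text and unused outer cells hold None.
    """
    start = page * PER_PAGE
    shown = ranked_categories[start:start + PER_PAGE]
    cells = shown + [None] * (PER_PAGE - len(shown))
    cells.insert(4, query)  # index 4 is the center of a row-major 3 x 3 grid
    grid = [cells[i:i + 3] for i in range(0, 9, 3)]
    # The "next" button is only useful when lower ranked categories remain.
    has_more = start + PER_PAGE < len(ranked_categories)
    return grid, has_more
```

The same function could serve the responsive web pages display, since that display uses the same grid with web pages in place of categories.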

The search results categories display 800 provides an efficient and aesthetically 
pleasing way for the user to find and access the responsive web pages that are most likely 
to contain the information the requesting party is seeking. For example, a requesting 
party interested in the latest information available about the benefits and risks of taking 
aspirin as a preventive measure prior to the onset of heart disease would be drawn 
quickly to the image of a bottle of aspirins and several aspirin tablets displayed in cell 
804a of Figure 8. The requesting party likewise would be able to quickly filter out 
wholly irrelevant information, such as web pages grouped under the category "romance", 
by recognizing that the image of the heart shape with an arrow through it is an image 
related to the heart as a symbol of romantic love, and not a health-related concept. 

Figure 9 is an exemplary responsive web pages display 900 used in one 
embodiment to implement step 708 of Figure 7. The responsive web pages display 900 
shown in Figure 9 is a continuation of the example described above with respect to 
Figure 8 in which the user has selected the category "aspirin". The responsive web pages 
display 900 is divided into a 3 x 3 grid of 9 cells, similar to the display 800 in Figure 8. 
The center cell 902 contains the same question mark image as center cell 802 in Figure 8. 
The text that appears beneath the image in center cell 902 indicates that the responsive 
web pages display 900 is being used to display web pages responsive to a query 
comprised of the search term "heart" that have been grouped within the category named 
"aspirin". The text also indicates that the display is being used to show eight of ten 
responsive websites in the category being displayed. 

In the outer cells 904a-904h, each cell is used to provide a visual representation of 
one of the eight top ranked responsive web pages within the category "aspirin". In one 
embodiment, a single representative image previously associated with each web page 
appears in the cell corresponding to the responsive web page. In one embodiment, 
multiple images are associated with each web page in the database and an animated slide 
show of images associated with the web page is presented for each web page displayed. 
As shown in Figure 9, in one embodiment, text appears beneath the image or images 
displayed for each web page describing the nature, location, source, and/or content of the 
responsive web page. 

The responsive web pages display 900 also includes a more pages button 906 
which, when selected, results in the next zero to eight responsive web pages being 
displayed. In the case illustrated in Figure 9, only two additional websites within the 
category "aspirin" would be displayed. 

In one embodiment, the slide show images are rotated at relatively slow intervals 
when the cursor is not on a particular one of cells 904a-904h and the pace of the slide 
show accelerates appreciably when the cursor is placed on a particular one of cells 904a- 
904h. This permits the requesting party to quickly view the set of images associated with 
a particular responsive web page by placing the cursor on the slide show for that page. 
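
The change of pace can be sketched with two intervals, one for idle cells and one for the cell under the cursor. The interval values of 4.0 and 0.5 seconds are assumptions of this sketch, as the disclosure says only that the idle pace is relatively slow and that the pace accelerates appreciably under the cursor.

```python
def slide_interval(hovered, idle_seconds=4.0, hover_seconds=0.5):
    """Seconds each image stays on screen (both values are assumed)."""
    return hover_seconds if hovered else idle_seconds


def frame_at(images, elapsed_seconds, hovered):
    """Image showing after elapsed_seconds spent in a constant hover state."""
    interval = slide_interval(hovered)
    return images[int(elapsed_seconds // interval) % len(images)]
```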

The above-described visual representation of search result categories and 
responsive web pages enables users to find desired information more quickly and 
efficiently by using a visual interface, which is much more familiar to users of the 
Internet than the traditional list approach. In addition, the slide show approach is 
advantageous because it enables a requesting party to do the equivalent of flipping 
through pages of a book or magazine on a bookshelf in a bookstore. By viewing the slide 
show, a requesting party can quickly get a sense of the nature of a web page and the 
content the user will find if the user accesses the page. By contrast, when search results 
are presented in a list or folder format, a requesting party must spend time reading a 
written description of each web page that may or may not provide an accurate indication 
of the content of the web page. Furthermore, the above-described approach saves on the 
number of mouse or other pointer "clicks" needed to review search results and find 
information, as a user can in many cases get more complete information regarding the 
multimedia content of a page without actually visiting the page. 

It should be noted that while the above detailed description focuses on a particular 
embodiment in which images are used to provide a visual representation of search result 
categories and responsive web pages, it is contemplated that the approach described 
above will be used with other forms of content available in sources of information such 
as the Internet. For example, there is a wealth of video content available on the Internet. 
Such video content could be accessed, evaluated, and harvested in the same manner as 
described above for static images. Harvested video could be associated with search result 
categories and web pages as described above with respect to the static images, and used 
in displays similar to those shown in Figures 8 and 9 to represent search categories and 
responsive web pages respectively. 

In such a video embodiment, segments of video would be selected to represent 
search result categories or responsive web pages in the same manner as described above 
for static images. The video clips would then be presented in reduced form in the same 
manner as shown in Figures 8 and 9. Such video clips would have the same advantage as 
static images, presented either singly or in a slide show as described above, in permitting 
a requesting party to quickly determine which categories of information and which 
responsive web pages within categories of interest are most likely to contain the 
information the requesting party is seeking. Audio clips likewise can be used to provide 
a multimedia representation of the nature and content of a web page in the same manner 
as described above with respect to images and video. 

While the above description focuses on an embodiment in which the database 
being searched is a database of web pages available via the Internet, the approach is 
equally applicable to presenting search results in response to a query of any database of 
information in which the database records may be represented by an associated image or 
set of images. Contemplated applications include interactive television applications. For 
example, a viewer of a sporting event on television may be provided with a cursor or 
other pointing device to be used to select images on the screen concerning which the 
requesting party would like to retrieve additional information. Alternatively, a viewer 
may be provided with a means for entering a search query in the form of text related to a 
program the viewer is viewing. In either case, a visual representation of search results 
such as those shown in Figures 8 and 9 and described above would be an advantageous 
and visually pleasing way to present search results on the television screen to such a 
viewer. 

In another interactive television embodiment, a database of information is 
accessed to provide a parallel presentation to a television broadcast or video presentation. 
Information about the broadcast is derived by either analyzing the broadcast or metadata 
associated with the broadcast such as a datacast and querying the database based on what 
is being broadcast to find and present information that is related to the broadcast. For 
example, closed caption information associated with the broadcast may be used to 
determine the broadcast content and search for related material. 
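
One simple way to derive a query from caption text is to take its most frequent non-trivial words, as sketched below. The stop word list, the minimum word length, and the choice of term frequency as the selection rule are assumptions of this sketch. The disclosure does not specify how broadcast content is determined from the captions.

```python
import re
from collections import Counter

# A small assumed stop word list; a practical system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "for", "of", "to", "in", "is", "on", "with"}


def caption_query(caption_text, top_n=3):
    """Return the most frequent non-stop words in the caption text."""
    words = re.findall(r"[a-z']+", caption_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]
```

The resulting terms would then be submitted as a query against the database in the same manner as a query typed by a user.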

In other embodiments, the search techniques described above may be used to 
search for and present material included on a DVD or other medium in addition to 
material found on the Internet. 

Although the foregoing invention has been described in some detail for purposes 
of clarity of understanding, it will be apparent that certain changes and modifications may 
be practiced. It should be noted that there are many alternative ways of implementing 